{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 第7篇：分组统计\n",
    "分组是我们日常处理和分析数据中经常会使用的操作技巧，比如现在有一个学校某一门科目所有的学生的考试成绩，要以班级为单位进行成绩排序，那么首先需要对所有学生成绩进行班级分组，即分别对每个班的学生成绩进行平均值（求和也可以）计算，最后再按成绩高低对班级进行排序。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第一部分：分组过程理解\n",
    "\n",
    "### 1. 分组过程\n",
    "分组函数在pandas中是groupby,其实质是split-apply-combine过程，简称SAC。**分组**是指涉及以下一个或多个步骤的过程：\n",
    "- 拆分数据到基于某些标准组。\n",
    "- 将功能独立地应用于每个组。\n",
    "- 将结果合并为数据结构。\n",
    "其中，拆分步骤是最简单的。实际上，在许多情况下，我们可能希望将数据集分成几组，然后对这些组进行处理。\n",
    "\n",
    "### 2. 分组应用\n",
    "在应用步骤中，我们可能希望执行以下操作之一：\n",
    "- 聚合（Aggregation）：为每个组计算摘要统计量。一些例子：\n",
    "    - 计算组总和或均值。\n",
    "    - 计算小组人数/人数。\n",
    "- 转换（Transformation）：执行一些特定于组的计算并返回索引相似的对象。一些例子：\n",
    "    - 标准化组内的数据（zscore）。\n",
    "    - 用从每个组派生的值填充组内的NA。\n",
    "- 过滤（Filtration）：根据评估为True或False的逐组计算丢弃一些组。一些例子：\n",
    "    - 丢弃属于只有几个成员的组的数据。\n",
    "    - 根据组总和或均值过滤数据。\n",
    "\n",
    "以上各项的某种组合：GroupBy将检查apply步骤的结果，并在不适合上述两种类别的情况下尝试返回明智的组合结果。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第二部分：分组函数"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "导入相关库"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "数据准备"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>income</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18.0</td>\n",
       "      <td>北京</td>\n",
       "      <td>male</td>\n",
       "      <td>3000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30.0</td>\n",
       "      <td>上海</td>\n",
       "      <td>male</td>\n",
       "      <td>8000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>35.0</td>\n",
       "      <td>广州</td>\n",
       "      <td>female</td>\n",
       "      <td>8000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>18.0</td>\n",
       "      <td>深圳</td>\n",
       "      <td>male</td>\n",
       "      <td>4000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Andy</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>female</td>\n",
       "      <td>6000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Alice</th>\n",
       "      <td>30.0</td>\n",
       "      <td></td>\n",
       "      <td>female</td>\n",
       "      <td>7000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Kobe</th>\n",
       "      <td>37.0</td>\n",
       "      <td>克利夫兰</td>\n",
       "      <td>male</td>\n",
       "      <td>10000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>25.0</td>\n",
       "      <td>晋城</td>\n",
       "      <td>male</td>\n",
       "      <td>70000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        age  city     sex  income\n",
       "name                             \n",
       "Tom    18.0    北京    male    3000\n",
       "Bob    30.0    上海    male    8000\n",
       "Mary   35.0    广州  female    8000\n",
       "James  18.0    深圳    male    4000\n",
       "Andy    NaN   NaN  female    6000\n",
       "Alice  30.0        female    7000\n",
       "Kobe   37.0  克利夫兰    male   10000\n",
       "Yafei  25.0    晋城    male   70000"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "index = pd.Index(data=[\"Tom\", \"Bob\", \"Mary\", \"James\", \"Andy\", \"Alice\", 'Kobe','Yafei'], name=\"name\")\n",
    "data = {\n",
    "    \"age\": [18, 30, 35, 18, np.nan, 30, 37, 25],\n",
    "    \"city\": [\"北京\", \"上海\", \"广州\", \"深圳\", np.nan, \" \", \"克利夫兰\", \"晋城\"],\n",
    "    \"sex\": [\"male\", \"male\", \"female\", \"male\", \"female\", \"female\", \"male\", \"male\"],\n",
    "    \"income\": [3000, 8000, 8000, 4000, 6000, 7000, 10000, 70000]\n",
    "}\n",
    "user_info = pd.DataFrame(data=data, index=index)\n",
    "user_info"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. groupby\n",
    ">  groupby(by=None,axis=0,level=None,as_index: bool = True, sort: bool = True,group_keys: bool = True,squeeze: bool = no_default,observed: bool = False,dropna: bool = True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 根据单列分组：将user_info按性别分组"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001EA12D09FA0>"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group = user_info.groupby(by='sex')\n",
    "user_info_sex_group"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'female': ['Mary', 'Andy', 'Alice'], 'male': ['Tom', 'Bob', 'James', 'Kobe', 'Yafei']}"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group.groups"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>income</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18.0</td>\n",
       "      <td>北京</td>\n",
       "      <td>male</td>\n",
       "      <td>3000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30.0</td>\n",
       "      <td>上海</td>\n",
       "      <td>male</td>\n",
       "      <td>8000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>18.0</td>\n",
       "      <td>深圳</td>\n",
       "      <td>male</td>\n",
       "      <td>4000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Kobe</th>\n",
       "      <td>37.0</td>\n",
       "      <td>克利夫兰</td>\n",
       "      <td>male</td>\n",
       "      <td>10000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>25.0</td>\n",
       "      <td>晋城</td>\n",
       "      <td>male</td>\n",
       "      <td>70000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        age  city   sex  income\n",
       "name                           \n",
       "Tom    18.0    北京  male    3000\n",
       "Bob    30.0    上海  male    8000\n",
       "James  18.0    深圳  male    4000\n",
       "Kobe   37.0  克利夫兰  male   10000\n",
       "Yafei  25.0    晋城  male   70000"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group.get_group('male')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>income</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>35.0</td>\n",
       "      <td>广州</td>\n",
       "      <td>female</td>\n",
       "      <td>8000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Andy</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>female</td>\n",
       "      <td>6000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Alice</th>\n",
       "      <td>30.0</td>\n",
       "      <td></td>\n",
       "      <td>female</td>\n",
       "      <td>7000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        age city     sex  income\n",
       "name                            \n",
       "Mary   35.0   广州  female    8000\n",
       "Andy    NaN  NaN  female    6000\n",
       "Alice  30.0       female    7000"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group.get_group('female')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 根据多列分组：将user_info按照性别和年龄分组"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "user_info_group = user_info.groupby(by=['sex', 'age'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{('female', 35.0): ['Mary'], ('female', nan): ['Andy'], ('female', 30.0): ['Alice'], ('male', 18.0): ['Tom', 'James'], ('male', 25.0): ['Yafei'], ('male', 30.0): ['Bob'], ('male', 37.0): ['Kobe']}"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_group.groups"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>income</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18.0</td>\n",
       "      <td>北京</td>\n",
       "      <td>male</td>\n",
       "      <td>3000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>18.0</td>\n",
       "      <td>深圳</td>\n",
       "      <td>male</td>\n",
       "      <td>4000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        age city   sex  income\n",
       "name                          \n",
       "Tom    18.0   北京  male    3000\n",
       "James  18.0   深圳  male    4000"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_group.get_group((\"male\", 18))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 关闭排序\n",
    "默认情况下，groupby 会在操作过程中对数据进行排序。如果为了更好的性能，可以设置 sort=False。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{('female', 35.0): ['Mary'], ('female', nan): ['Andy'], ('female', 30.0): ['Alice'], ('male', 18.0): ['Tom', 'James'], ('male', 25.0): ['Yafei'], ('male', 30.0): ['Bob'], ('male', 37.0): ['Kobe']}"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_group = user_info.groupby(by=['sex', 'age'], sort=False)\n",
    "user_info_group.groups"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### level参数（用于多级索引）和axis参数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{18.0: [('male', 18.0), ('male', 18.0)], 30.0: [('male', 30.0), ('female', 30.0)], 35.0: [('female', 35.0)], 37.0: [('male', 37.0)], 25.0: [('male', 25.0)]}"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.set_index(['sex', 'age']).groupby(level=1, sort=False).groups"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. 对象属性"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 查看分组前n行：head\n",
    "对分组对象使用head函数，返回的是每个组的前几行，而不是数据集前几行"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>income</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18.0</td>\n",
       "      <td>北京</td>\n",
       "      <td>male</td>\n",
       "      <td>3000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30.0</td>\n",
       "      <td>上海</td>\n",
       "      <td>male</td>\n",
       "      <td>8000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>35.0</td>\n",
       "      <td>广州</td>\n",
       "      <td>female</td>\n",
       "      <td>8000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Andy</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>female</td>\n",
       "      <td>6000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city     sex  income\n",
       "name                           \n",
       "Tom   18.0   北京    male    3000\n",
       "Bob   30.0   上海    male    8000\n",
       "Mary  35.0   广州  female    8000\n",
       "Andy   NaN  NaN  female    6000"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group.head(2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 查看分组第1行：first\n",
    "first显示的是以分组为索引的每组的第一个分组信息"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>income</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sex</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>female</th>\n",
       "      <td>35.0</td>\n",
       "      <td>广州</td>\n",
       "      <td>8000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>male</th>\n",
       "      <td>18.0</td>\n",
       "      <td>北京</td>\n",
       "      <td>3000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         age city  income\n",
       "sex                      \n",
       "female  35.0   广州    8000\n",
       "male    18.0   北京    3000"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group.first()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 查看指定列\n",
    "在使用 groupby 进行分组后，可以使用点或切片 [] 操作来完成对某一列的选择"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "name\n",
       "Tom        北京\n",
       "Bob        上海\n",
       "Mary       广州\n",
       "James      深圳\n",
       "Andy      NaN\n",
       "Alice        \n",
       "Kobe     克利夫兰\n",
       "Yafei      晋城\n",
       "Name: city, dtype: object"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_group.city.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "name\n",
       "Tom        北京\n",
       "Bob        上海\n",
       "Mary       广州\n",
       "James      深圳\n",
       "Andy      NaN\n",
       "Alice        \n",
       "Kobe     克利夫兰\n",
       "Yafei      晋城\n",
       "Name: city, dtype: object"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_group['city'].head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>income</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>北京</td>\n",
       "      <td>3000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>上海</td>\n",
       "      <td>8000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>广州</td>\n",
       "      <td>8000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>深圳</td>\n",
       "      <td>4000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Andy</th>\n",
       "      <td>NaN</td>\n",
       "      <td>6000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Alice</th>\n",
       "      <td></td>\n",
       "      <td>7000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Kobe</th>\n",
       "      <td>克利夫兰</td>\n",
       "      <td>10000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>晋城</td>\n",
       "      <td>70000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       city  income\n",
       "name               \n",
       "Tom      北京    3000\n",
       "Bob      上海    8000\n",
       "Mary     广州    8000\n",
       "James    深圳    4000\n",
       "Andy    NaN    6000\n",
       "Alice          7000\n",
       "Kobe   克利夫兰   10000\n",
       "Yafei    晋城   70000"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_group[['city', 'income']].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 查看指定列的简单统计信息"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sex</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>female</th>\n",
       "      <td>32.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>male</th>\n",
       "      <td>25.6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         age\n",
       "sex         \n",
       "female  32.5\n",
       "male    25.6"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group[['age']].mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sex</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>female</th>\n",
       "      <td>65.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>male</th>\n",
       "      <td>128.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          age\n",
       "sex          \n",
       "female   65.0\n",
       "male    128.0"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group[['age']].sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 组个数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group.ngroups"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 组大小/容量"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "sex\n",
       "female    3\n",
       "male      5\n",
       "dtype: int64"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group.size()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 组索引"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'female': ['Mary', 'Andy', 'Alice'], 'male': ['Tom', 'Bob', 'James', 'Kobe', 'Yafei']}"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group.groups"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.  遍历分组"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在对数据进行分组后，可以进行遍历。如果是根据多个字段来分组的，每个组的名称是一个元组。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "female\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>income</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>35.0</td>\n",
       "      <td>广州</td>\n",
       "      <td>female</td>\n",
       "      <td>8000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Andy</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>female</td>\n",
       "      <td>6000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Alice</th>\n",
       "      <td>30.0</td>\n",
       "      <td></td>\n",
       "      <td>female</td>\n",
       "      <td>7000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        age city     sex  income\n",
       "name                            \n",
       "Mary   35.0   广州  female    8000\n",
       "Andy    NaN  NaN  female    6000\n",
       "Alice  30.0       female    7000"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "male\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>income</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18.0</td>\n",
       "      <td>北京</td>\n",
       "      <td>male</td>\n",
       "      <td>3000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30.0</td>\n",
       "      <td>上海</td>\n",
       "      <td>male</td>\n",
       "      <td>8000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>18.0</td>\n",
       "      <td>深圳</td>\n",
       "      <td>male</td>\n",
       "      <td>4000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Kobe</th>\n",
       "      <td>37.0</td>\n",
       "      <td>克利夫兰</td>\n",
       "      <td>male</td>\n",
       "      <td>10000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>25.0</td>\n",
       "      <td>晋城</td>\n",
       "      <td>male</td>\n",
       "      <td>70000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        age  city   sex  income\n",
       "name                           \n",
       "Tom    18.0    北京  male    3000\n",
       "Bob    30.0    上海  male    8000\n",
       "James  18.0    深圳  male    4000\n",
       "Kobe   37.0  克利夫兰  male   10000\n",
       "Yafei  25.0    晋城  male   70000"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "for name, group in user_info_sex_group:\n",
    "    print(name)\n",
    "    display(group)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "('male', 18.0)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>income</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18.0</td>\n",
       "      <td>北京</td>\n",
       "      <td>male</td>\n",
       "      <td>3000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>18.0</td>\n",
       "      <td>深圳</td>\n",
       "      <td>male</td>\n",
       "      <td>4000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        age city   sex  income\n",
       "name                          \n",
       "Tom    18.0   北京  male    3000\n",
       "James  18.0   深圳  male    4000"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "('male', 30.0)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>income</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30.0</td>\n",
       "      <td>上海</td>\n",
       "      <td>male</td>\n",
       "      <td>8000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city   sex  income\n",
       "name                         \n",
       "Bob   30.0   上海  male    8000"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "('female', 35.0)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>income</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>35.0</td>\n",
       "      <td>广州</td>\n",
       "      <td>female</td>\n",
       "      <td>8000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city     sex  income\n",
       "name                           \n",
       "Mary  35.0   广州  female    8000"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "('female', 30.0)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>income</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Alice</th>\n",
       "      <td>30.0</td>\n",
       "      <td></td>\n",
       "      <td>female</td>\n",
       "      <td>7000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        age city     sex  income\n",
       "name                            \n",
       "Alice  30.0       female    7000"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "('male', 37.0)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>income</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Kobe</th>\n",
       "      <td>37.0</td>\n",
       "      <td>克利夫兰</td>\n",
       "      <td>male</td>\n",
       "      <td>10000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age  city   sex  income\n",
       "name                          \n",
       "Kobe  37.0  克利夫兰  male   10000"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "('male', 25.0)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>income</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>25.0</td>\n",
       "      <td>晋城</td>\n",
       "      <td>male</td>\n",
       "      <td>70000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        age city   sex  income\n",
       "name                          \n",
       "Yafei  25.0   晋城  male   70000"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "for name, group in user_info_group:\n",
    "    print(name)\n",
    "    display(group)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. 选择组元素"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>income</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18.0</td>\n",
       "      <td>北京</td>\n",
       "      <td>3000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30.0</td>\n",
       "      <td>上海</td>\n",
       "      <td>8000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>18.0</td>\n",
       "      <td>深圳</td>\n",
       "      <td>4000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Kobe</th>\n",
       "      <td>37.0</td>\n",
       "      <td>克利夫兰</td>\n",
       "      <td>10000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>25.0</td>\n",
       "      <td>晋城</td>\n",
       "      <td>70000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        age  city  income\n",
       "name                     \n",
       "Tom    18.0    北京    3000\n",
       "Bob    30.0    上海    8000\n",
       "James  18.0    深圳    4000\n",
       "Kobe   37.0  克利夫兰   10000\n",
       "Yafei  25.0    晋城   70000"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group.get_group(\"male\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>income</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18.0</td>\n",
       "      <td>北京</td>\n",
       "      <td>male</td>\n",
       "      <td>3000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>18.0</td>\n",
       "      <td>深圳</td>\n",
       "      <td>male</td>\n",
       "      <td>4000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        age city   sex  income\n",
       "name                          \n",
       "Tom    18.0   北京  male    3000\n",
       "James  18.0   深圳  male    4000"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_group.get_group((\"male\", 18))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5. 分组函数的方法"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['age', 'agg', 'aggregate', 'all', 'any', 'apply', 'backfill', 'bfill', 'boxplot', 'city', 'corr', 'corrwith', 'count', 'cov', 'cumcount', 'cummax', 'cummin', 'cumprod', 'cumsum', 'describe', 'diff', 'dtypes', 'expanding', 'ffill', 'fillna', 'filter', 'first', 'get_group', 'groups', 'head', 'hist', 'idxmax', 'idxmin', 'income', 'indices', 'last', 'mad', 'max', 'mean', 'median', 'min', 'ndim', 'ngroup', 'ngroups', 'nth', 'nunique', 'ohlc', 'pad', 'pct_change', 'pipe', 'plot', 'prod', 'quantile', 'rank', 'resample', 'rolling', 'sample', 'sem', 'sex', 'shift', 'size', 'skew', 'std', 'sum', 'tail', 'take', 'transform', 'tshift', 'var']\n"
     ]
    }
   ],
   "source": [
    "print([attr for attr in dir(user_info_group) if not attr.startswith('_')])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>income</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sex</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>female</th>\n",
       "      <td>32.5</td>\n",
       "      <td>7000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>male</th>\n",
       "      <td>25.0</td>\n",
       "      <td>8000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         age  income\n",
       "sex                 \n",
       "female  32.5    7000\n",
       "male    25.0    8000"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group.median()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>income</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sex</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>female</th>\n",
       "      <td>35.0</td>\n",
       "      <td>8000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>male</th>\n",
       "      <td>37.0</td>\n",
       "      <td>70000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         age  income\n",
       "sex                 \n",
       "female  35.0    8000\n",
       "male    37.0   70000"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group.max()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "由此可见，groupby对象可以使用相当多的函数，灵活程度很高"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 6. 连续性变量分组\n",
    "**分布分析-cut**\n",
    "> pd.cut(data['col_names'], bins, labels=None)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "年龄分层\n",
       "20岁及以下     2\n",
       "21岁到30岁    3\n",
       "31岁到40岁    2\n",
       "41岁以上      0\n",
       "Name: age, dtype: int64"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bins = [0,20,30,40,100]\n",
    "labels = ['20岁及以下','21岁到30岁','31岁到40岁','41岁以上']\n",
    "user_info['年龄分层'] = pd.cut(user_info['age'],bins=bins, labels=labels) #可选label添加自定义标签\n",
    "user_info.groupby(by=['年龄分层'])['age'].count()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第三部分：聚合、转换和过滤"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. 聚合（Aggregation）\n",
    "分组的目的是为了统计，统计的时候需要聚合，所以我们需要在分完组后来看下如何进行聚合。常见的一些聚合操作有：计数、求和、最大值、最小值、平均值等。想要实现聚合操作，一种方式就是调用 agg 方法。　"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 常用聚合函数\n",
    "常用的聚合函数包含：mean/sum/size/count/std/var/sem/describe/first/last/nth/min/max等等，下面看一下示例："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>income</th>\n",
       "      <th>年龄分层</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18.0</td>\n",
       "      <td>北京</td>\n",
       "      <td>male</td>\n",
       "      <td>3000</td>\n",
       "      <td>20岁及以下</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30.0</td>\n",
       "      <td>上海</td>\n",
       "      <td>male</td>\n",
       "      <td>8000</td>\n",
       "      <td>21岁到30岁</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>35.0</td>\n",
       "      <td>广州</td>\n",
       "      <td>female</td>\n",
       "      <td>8000</td>\n",
       "      <td>31岁到40岁</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>18.0</td>\n",
       "      <td>深圳</td>\n",
       "      <td>male</td>\n",
       "      <td>4000</td>\n",
       "      <td>20岁及以下</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Andy</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>female</td>\n",
       "      <td>6000</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Alice</th>\n",
       "      <td>30.0</td>\n",
       "      <td></td>\n",
       "      <td>female</td>\n",
       "      <td>7000</td>\n",
       "      <td>21岁到30岁</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Kobe</th>\n",
       "      <td>37.0</td>\n",
       "      <td>克利夫兰</td>\n",
       "      <td>male</td>\n",
       "      <td>10000</td>\n",
       "      <td>31岁到40岁</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>25.0</td>\n",
       "      <td>晋城</td>\n",
       "      <td>male</td>\n",
       "      <td>70000</td>\n",
       "      <td>21岁到30岁</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        age  city     sex  income     年龄分层\n",
       "name                                      \n",
       "Tom    18.0    北京    male    3000   20岁及以下\n",
       "Bob    30.0    上海    male    8000  21岁到30岁\n",
       "Mary   35.0    广州  female    8000  31岁到40岁\n",
       "James  18.0    深圳    male    4000   20岁及以下\n",
       "Andy    NaN   NaN  female    6000      NaN\n",
       "Alice  30.0        female    7000  21岁到30岁\n",
       "Kobe   37.0  克利夫兰    male   10000  31岁到40岁\n",
       "Yafei  25.0    晋城    male   70000  21岁到30岁"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [],
   "source": [
    "user_info_sex_group = user_info.groupby('sex')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "获取不同性别下所包含的人数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "sex\n",
       "female    3.0\n",
       "male      5.0\n",
       "Name: age, dtype: float64"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group['age'].agg(len)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "sex\n",
       "female    2\n",
       "male      5\n",
       "Name: age, dtype: int64"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group.age.count()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "sex\n",
       "female    3\n",
       "male      5\n",
       "Name: age, dtype: int64"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group.age.size()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "获取不同性别年龄下的各指标数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [],
   "source": [
    "user_info_group = user_info.groupby(by=['sex', 'age'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>income</th>\n",
       "      <th>年龄分层</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">female</th>\n",
       "      <th>30.0</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35.0</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"4\" valign=\"top\">male</th>\n",
       "      <th>18.0</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25.0</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30.0</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37.0</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             city  income  年龄分层\n",
       "sex    age                     \n",
       "female 30.0     1       1     1\n",
       "       35.0     1       1     1\n",
       "male   18.0     2       2     2\n",
       "       25.0     1       1     1\n",
       "       30.0     1       1     1\n",
       "       37.0     1       1     1"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_group.agg(len)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>income</th>\n",
       "      <th>年龄分层</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">female</th>\n",
       "      <th>30.0</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35.0</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"4\" valign=\"top\">male</th>\n",
       "      <th>18.0</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25.0</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30.0</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37.0</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             city  income  年龄分层\n",
       "sex    age                     \n",
       "female 30.0     1       1     1\n",
       "       35.0     1       1     1\n",
       "male   18.0     2       2     2\n",
       "       25.0     1       1     1\n",
       "       30.0     1       1     1\n",
       "       37.0     1       1     1"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_group.count()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "sex     age \n",
       "female  30.0    1\n",
       "        35.0    1\n",
       "male    18.0    2\n",
       "        25.0    1\n",
       "        30.0    1\n",
       "        37.0    1\n",
       "dtype: int64"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_group.size()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "获取不同性别下包含的最大的年龄"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "sex\n",
       "female    35.0\n",
       "male      37.0\n",
       "Name: age, dtype: float64"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group.age.agg(max)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "sex\n",
       "female    35.0\n",
       "male      37.0\n",
       "Name: age, dtype: float64"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group.age.agg(np.max)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "sex\n",
       "female    35.0\n",
       "male      37.0\n",
       "Name: age, dtype: float64"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group.age.max()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Series 和 DataFrame 都包含了 describe 方法，我们分组后一样可以使用 describe 方法来查看数据的情况。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"8\" halign=\"left\">age</th>\n",
       "      <th colspan=\"8\" halign=\"left\">income</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>min</th>\n",
       "      <th>25%</th>\n",
       "      <th>50%</th>\n",
       "      <th>75%</th>\n",
       "      <th>max</th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>min</th>\n",
       "      <th>25%</th>\n",
       "      <th>50%</th>\n",
       "      <th>75%</th>\n",
       "      <th>max</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sex</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>female</th>\n",
       "      <td>2.0</td>\n",
       "      <td>32.5</td>\n",
       "      <td>3.535534</td>\n",
       "      <td>30.0</td>\n",
       "      <td>31.25</td>\n",
       "      <td>32.5</td>\n",
       "      <td>33.75</td>\n",
       "      <td>35.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>7000.0</td>\n",
       "      <td>1000.000000</td>\n",
       "      <td>6000.0</td>\n",
       "      <td>6500.0</td>\n",
       "      <td>7000.0</td>\n",
       "      <td>7500.0</td>\n",
       "      <td>8000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>male</th>\n",
       "      <td>5.0</td>\n",
       "      <td>25.6</td>\n",
       "      <td>8.142481</td>\n",
       "      <td>18.0</td>\n",
       "      <td>18.00</td>\n",
       "      <td>25.0</td>\n",
       "      <td>30.00</td>\n",
       "      <td>37.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>19000.0</td>\n",
       "      <td>28653.097564</td>\n",
       "      <td>3000.0</td>\n",
       "      <td>4000.0</td>\n",
       "      <td>8000.0</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>70000.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         age                                                 income           \\\n",
       "       count  mean       std   min    25%   50%    75%   max  count     mean   \n",
       "sex                                                                            \n",
       "female   2.0  32.5  3.535534  30.0  31.25  32.5  33.75  35.0    3.0   7000.0   \n",
       "male     5.0  25.6  8.142481  18.0  18.00  25.0  30.00  37.0    5.0  19000.0   \n",
       "\n",
       "                                                                \n",
       "                 std     min     25%     50%      75%      max  \n",
       "sex                                                             \n",
       "female   1000.000000  6000.0  6500.0  7000.0   7500.0   8000.0  \n",
       "male    28653.097564  3000.0  4000.0  8000.0  10000.0  70000.0  "
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th colspan=\"8\" halign=\"left\">income</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>min</th>\n",
       "      <th>25%</th>\n",
       "      <th>50%</th>\n",
       "      <th>75%</th>\n",
       "      <th>max</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">female</th>\n",
       "      <th>30.0</th>\n",
       "      <td>1.0</td>\n",
       "      <td>7000.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>7000.0</td>\n",
       "      <td>7000.0</td>\n",
       "      <td>7000.0</td>\n",
       "      <td>7000.0</td>\n",
       "      <td>7000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35.0</th>\n",
       "      <td>1.0</td>\n",
       "      <td>8000.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>8000.0</td>\n",
       "      <td>8000.0</td>\n",
       "      <td>8000.0</td>\n",
       "      <td>8000.0</td>\n",
       "      <td>8000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"4\" valign=\"top\">male</th>\n",
       "      <th>18.0</th>\n",
       "      <td>2.0</td>\n",
       "      <td>3500.0</td>\n",
       "      <td>707.106781</td>\n",
       "      <td>3000.0</td>\n",
       "      <td>3250.0</td>\n",
       "      <td>3500.0</td>\n",
       "      <td>3750.0</td>\n",
       "      <td>4000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25.0</th>\n",
       "      <td>1.0</td>\n",
       "      <td>70000.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>70000.0</td>\n",
       "      <td>70000.0</td>\n",
       "      <td>70000.0</td>\n",
       "      <td>70000.0</td>\n",
       "      <td>70000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30.0</th>\n",
       "      <td>1.0</td>\n",
       "      <td>8000.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>8000.0</td>\n",
       "      <td>8000.0</td>\n",
       "      <td>8000.0</td>\n",
       "      <td>8000.0</td>\n",
       "      <td>8000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37.0</th>\n",
       "      <td>1.0</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>10000.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            income                                                           \\\n",
       "             count     mean         std      min      25%      50%      75%   \n",
       "sex    age                                                                    \n",
       "female 30.0    1.0   7000.0         NaN   7000.0   7000.0   7000.0   7000.0   \n",
       "       35.0    1.0   8000.0         NaN   8000.0   8000.0   8000.0   8000.0   \n",
       "male   18.0    2.0   3500.0  707.106781   3000.0   3250.0   3500.0   3750.0   \n",
       "       25.0    1.0  70000.0         NaN  70000.0  70000.0  70000.0  70000.0   \n",
       "       30.0    1.0   8000.0         NaN   8000.0   8000.0   8000.0   8000.0   \n",
       "       37.0    1.0  10000.0         NaN  10000.0  10000.0  10000.0  10000.0   \n",
       "\n",
       "                      \n",
       "                 max  \n",
       "sex    age            \n",
       "female 30.0   7000.0  \n",
       "       35.0   8000.0  \n",
       "male   18.0   4000.0  \n",
       "       25.0  70000.0  \n",
       "       30.0   8000.0  \n",
       "       37.0  10000.0  "
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_group.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 避免多层索引"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "如果是根据多个键来进行聚合，默认情况下得到的结果是一个多层索引结构。有两种方式可以避免出现多层索引，先来介绍第一种。对包含多层索引的对象调用 reset_index 方法。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 方式一"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>income</th>\n",
       "      <th>年龄分层</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">female</th>\n",
       "      <th>30.0</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35.0</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"4\" valign=\"top\">male</th>\n",
       "      <th>18.0</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25.0</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30.0</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37.0</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             city  income  年龄分层\n",
       "sex    age                     \n",
       "female 30.0     1       1     1\n",
       "       35.0     1       1     1\n",
       "male   18.0     2       2     2\n",
       "       25.0     1       1     1\n",
       "       30.0     1       1     1\n",
       "       37.0     1       1     1"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_group.agg(len)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>income</th>\n",
       "      <th>年龄分层</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>female</td>\n",
       "      <td>30.0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>male</td>\n",
       "      <td>18.0</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>male</td>\n",
       "      <td>25.0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>male</td>\n",
       "      <td>30.0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>male</td>\n",
       "      <td>37.0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      sex   age  city  income  年龄分层\n",
       "0  female  30.0     1       1     1\n",
       "1  female  35.0     1       1     1\n",
       "2    male  18.0     2       2     2\n",
       "3    male  25.0     1       1     1\n",
       "4    male  30.0     1       1     1\n",
       "5    male  37.0     1       1     1"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_group.agg(len).reset_index()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 方式二\n",
    "另外一种方式是在分组时，设置参数 as_index=False"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>income</th>\n",
       "      <th>年龄分层</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>female</td>\n",
       "      <td>30.0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>male</td>\n",
       "      <td>18.0</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>male</td>\n",
       "      <td>25.0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>male</td>\n",
       "      <td>30.0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>male</td>\n",
       "      <td>37.0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      sex   age  city  income  年龄分层\n",
       "0  female  30.0     1       1     1\n",
       "1  female  35.0     1       1     1\n",
       "2    male  18.0     2       2     2\n",
       "3    male  25.0     1       1     1\n",
       "4    male  30.0     1       1     1\n",
       "5    male  37.0     1       1     1"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.groupby(['sex', 'age'], as_index=False).agg(len)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 应用多个聚合操作\n",
    "有时候进行分组后，不单单想得到一个统计结果，有可能是多个。比如想统计出不同性别下的一个收入的总和和平均值。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>sum</th>\n",
       "      <th>mean</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sex</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>female</th>\n",
       "      <td>21000</td>\n",
       "      <td>7000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>male</th>\n",
       "      <td>95000</td>\n",
       "      <td>19000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          sum   mean\n",
       "sex                 \n",
       "female  21000   7000\n",
       "male    95000  19000"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group[\"income\"].agg([np.sum, np.mean])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 利用元组重命名"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>和</th>\n",
       "      <th>平均值</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sex</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>female</th>\n",
       "      <td>21000</td>\n",
       "      <td>7000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>male</th>\n",
       "      <td>95000</td>\n",
       "      <td>19000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            和    平均值\n",
       "sex                 \n",
       "female  21000   7000\n",
       "male    95000  19000"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group[\"income\"].agg([('和', np.sum), ('平均值', np.mean)])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 对DataFrame列应用不同的聚合操作\n",
    "有时候可能需要对不同的列使用不同的聚合操作。例如，想要统计不同性别下人群的年龄的均值以及收入的总和。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age_mean</th>\n",
       "      <th>income_sum</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sex</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>female</th>\n",
       "      <td>32.5</td>\n",
       "      <td>21000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>male</th>\n",
       "      <td>25.6</td>\n",
       "      <td>95000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        age_mean  income_sum\n",
       "sex                         \n",
       "female      32.5       21000\n",
       "male        25.6       95000"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group.agg({\"age\": np.mean, \"income\": np.sum}).rename(columns={\"age\": \"age_mean\", \"income\": \"income_sum\"})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 自定义函数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "sex\n",
       "female    33.5\n",
       "male      26.6\n",
       "Name: age, dtype: float64"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group.age.agg(lambda x: x.mean() + 1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "sex\n",
       "female     5.0\n",
       "male      19.0\n",
       "Name: age, dtype: float64"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group.age.agg(lambda x: x.max() - x.min())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 利用NamedAgg函数进行多个聚合\n",
    "注意：不支持lambda函数，但是可以使用外置的def函数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>min_income1</th>\n",
       "      <th>max_income</th>\n",
       "      <th>range_income</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sex</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>female</th>\n",
       "      <td>2000</td>\n",
       "      <td>8000</td>\n",
       "      <td>1000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>male</th>\n",
       "      <td>67000</td>\n",
       "      <td>70000</td>\n",
       "      <td>62000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        min_income1  max_income  range_income\n",
       "sex                                          \n",
       "female         2000        8000          1000\n",
       "male          67000       70000         62000"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def R1(x):\n",
    "    return x.max()-x.min()\n",
    "def R2(x):\n",
    "    return x.max()-x.median()\n",
    "user_info_sex_group.agg(min_income1=pd.NamedAgg(column='income', aggfunc=R1),\n",
    "                           max_income=pd.NamedAgg(column='income', aggfunc='max'),\n",
    "                           range_income=pd.NamedAgg(column='income', aggfunc=R2)).head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 带参数的聚合函数\n",
    "判断是否组内年龄至少有一个值在30-40之间："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "sex\n",
       "female    True\n",
       "male      True\n",
       "Name: age, dtype: bool"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def f(s,low,high):\n",
    "    return s.between(low,high).max()\n",
    "user_info_sex_group['age'].agg(f,30,40)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "如果需要使用多个函数，并且其中至少有一个带参数，则使用wrap技巧："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>func_name</th>\n",
       "      <th>mean</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sex</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>female</th>\n",
       "      <td>True</td>\n",
       "      <td>32.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>male</th>\n",
       "      <td>True</td>\n",
       "      <td>25.6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        func_name  mean\n",
       "sex                    \n",
       "female       True  32.5\n",
       "male         True  25.6"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def f_test(s,low,high):\n",
    "    return s.between(low,high).max()\n",
    "def agg_f(f_mul,name,*args):\n",
    "    def wrapper(x):\n",
    "        return f_mul(x,*args)\n",
    "    wrapper.__name__ = name\n",
    "    return wrapper\n",
    "user_info_sex_group['age'].agg([agg_f(f_test,'func_name',30,40),'mean'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. 过滤（Filteration）\n",
    "filter函数是用来筛选某些组的（务必记住结果是组的全体），因此传入的值应当是布尔标量"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>income</th>\n",
       "      <th>年龄分层</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18.0</td>\n",
       "      <td>北京</td>\n",
       "      <td>male</td>\n",
       "      <td>3000</td>\n",
       "      <td>20岁及以下</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30.0</td>\n",
       "      <td>上海</td>\n",
       "      <td>male</td>\n",
       "      <td>8000</td>\n",
       "      <td>21岁到30岁</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>35.0</td>\n",
       "      <td>广州</td>\n",
       "      <td>female</td>\n",
       "      <td>8000</td>\n",
       "      <td>31岁到40岁</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>18.0</td>\n",
       "      <td>深圳</td>\n",
       "      <td>male</td>\n",
       "      <td>4000</td>\n",
       "      <td>20岁及以下</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Andy</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>female</td>\n",
       "      <td>6000</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        age city     sex  income     年龄分层\n",
       "name                                     \n",
       "Tom    18.0   北京    male    3000   20岁及以下\n",
       "Bob    30.0   上海    male    8000  21岁到30岁\n",
       "Mary   35.0   广州  female    8000  31岁到40岁\n",
       "James  18.0   深圳    male    4000   20岁及以下\n",
       "Andy    NaN  NaN  female    6000      NaN"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group.filter(lambda x: (x['income'] > 32).all()).head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. 变换（Transformation）\n",
    "前面进行聚合运算的时候，得到的结果是一个以分组名作为索引的结果对象。虽然可以指定 as_index=False ,但是得到的索引也并不是元数据的索引。如果我们想使用原数组的索引的话，就需要进行 merge 转换。\n",
    "transform方法简化了这个过程，它会把 func 参数应用到所有分组，然后把结果放置到原数组的索引上（如果结果是一个标量，就进行广播）"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 组内元素变换\n",
    "transform函数中传入的对象是组内的列，并且返回值需要与列长完全一致"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "sex\n",
       "female     7000\n",
       "male      19000\n",
       "Name: income, dtype: int64"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group.income.agg(np.mean)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "name\n",
       "Tom      19000\n",
       "Bob      19000\n",
       "Mary      7000\n",
       "James    19000\n",
       "Andy      7000\n",
       "Alice     7000\n",
       "Kobe     19000\n",
       "Yafei    19000\n",
       "Name: income, dtype: int64"
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group.income.transform(np.mean)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>income</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>25.6</td>\n",
       "      <td>19000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>25.6</td>\n",
       "      <td>19000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>32.5</td>\n",
       "      <td>7000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>25.6</td>\n",
       "      <td>19000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Andy</th>\n",
       "      <td>32.5</td>\n",
       "      <td>7000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Alice</th>\n",
       "      <td>32.5</td>\n",
       "      <td>7000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Kobe</th>\n",
       "      <td>25.6</td>\n",
       "      <td>19000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>25.6</td>\n",
       "      <td>19000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        age  income\n",
       "name               \n",
       "Tom    25.6   19000\n",
       "Bob    25.6   19000\n",
       "Mary   32.5    7000\n",
       "James  25.6   19000\n",
       "Andy   32.5    7000\n",
       "Alice  32.5    7000\n",
       "Kobe   25.6   19000\n",
       "Yafei  25.6   19000"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group[['age', 'income']].transform(np.mean)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 利用变换方法进行组内标准化"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "name\n",
       "Tom     -0.558404\n",
       "Bob     -0.383903\n",
       "Mary     1.000000\n",
       "James   -0.523504\n",
       "Andy    -1.000000\n",
       "Name: income, dtype: float64"
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group['income'].transform(lambda x:(x-x.mean())/x.std()).head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 利用变换方法进行组内缺失值的均值填充"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "sex\n",
       "female    32.5\n",
       "male      25.6\n",
       "Name: age, dtype: float64"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group.age.mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>income</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18.0</td>\n",
       "      <td>3000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30.0</td>\n",
       "      <td>8000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>35.0</td>\n",
       "      <td>8000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>18.0</td>\n",
       "      <td>4000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Andy</th>\n",
       "      <td>32.5</td>\n",
       "      <td>6000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Alice</th>\n",
       "      <td>30.0</td>\n",
       "      <td>7000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Kobe</th>\n",
       "      <td>37.0</td>\n",
       "      <td>10000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>25.0</td>\n",
       "      <td>70000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        age  income\n",
       "name               \n",
       "Tom    18.0    3000\n",
       "Bob    30.0    8000\n",
       "Mary   35.0    8000\n",
       "James  18.0    4000\n",
       "Andy   32.5    6000\n",
       "Alice  30.0    7000\n",
       "Kobe   37.0   10000\n",
       "Yafei  25.0   70000"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group.transform(lambda x: x.fillna(x.mean()))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第四部分：apply函数\n",
    "\n",
    "除了 上述聚合、过滤和变换操作外，还有更神奇的 apply 操作。  \n",
    "　　apply 会将待处理的对象拆分成多个片段，然后对各片段调用传入的函数，最后尝试用 pd.concat() 把结果组合起来。func 的返回值可以是 Pandas 对象或标量，并且数组对象的大小不限。  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### transform与apply的比较\n",
    "**相同点**\n",
    "- 都能针对dataframe完成特征的计算，并且常常与groupby()方法一起使用。\n",
    "\n",
    "**不同点**\n",
    "\n",
    "- apply()里面可以跟自定义的函数，包括简单的求和函数以及复杂的特征间的差值函数等\n",
    "\n",
    "- transform() 里面不能跟自定义的特征交互函数，因为transform是针对每一元素（即每一列特征操作）进行计算，也就是说在使用 transform() 方法时，需要记得三点：  \n",
    "    1、它只能对每一列进行计算，所以在groupby()之后，.transform()之前是要指定要操作的列，这点也与apply有很大的不同。\n",
    "\n",
    "    2、由于是只能对每一列计算，所以方法的通用性相比apply()就局限了很多，例如只能求列的最大/最小/均值/方差/分箱等操作\n",
    "\n",
    "    3、transform还有什么用呢?最简单的情况是试图将函数的结果分配回原始的dataframe。也就是说返回的shape是（len(df)，1）。注：如果与groupby()方法联合使用，需要对值进行去重"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### apply的使用"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**apply函数的灵活性**  \n",
    "> 可能在所有的分组函数中，apply是应用最为广泛的，这得益于它的灵活性，apply函数的灵活性很大程度来源于其返回值的多样性："
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 标量返回值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>income</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sex</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>female</th>\n",
       "      <td>32.5</td>\n",
       "      <td>7000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>male</th>\n",
       "      <td>25.6</td>\n",
       "      <td>19000.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         age   income\n",
       "sex                  \n",
       "female  32.5   7000.0\n",
       "male    25.6  19000.0"
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group.apply(np.mean)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "比如想要统计不同性别最高收入的前n个值，可以通过下面这种方式实现"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "sex\n",
       "female      [8000, 7000]\n",
       "male      [70000, 10000]\n",
       "Name: income, dtype: object"
      ]
     },
     "execution_count": 63,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def f1(ser, num=2):\n",
    "    return ser.nlargest(num).tolist()\n",
    "user_info_sex_group[\"income\"].apply(f1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "另外，如果想要获取不同性别下的年龄的均值，通过 apply 可以如下实现。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "sex\n",
       "female    32.5\n",
       "male      25.6\n",
       "dtype: float64"
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def f2(sex):\n",
    "    return sex.age.mean()\n",
    "user_info_sex_group.apply(f2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "查看不同性别下收入的最小值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "sex\n",
       "female    6000\n",
       "male      3000\n",
       "Name: income, dtype: int64"
      ]
     },
     "execution_count": 65,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group['income'].apply(lambda x: x.min())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 列表返回值"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "查看不同性别下每个人的年龄与最小值之差"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "name\n",
       "Tom          0\n",
       "Bob       5000\n",
       "Mary      2000\n",
       "James     1000\n",
       "Andy         0\n",
       "Alice     1000\n",
       "Kobe      7000\n",
       "Yafei    67000\n",
       "Name: income, dtype: int64"
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group['income'].apply(lambda x: x - x.min())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "查看不同性别下每个人的年龄和收入与最小值之差"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>income</th>\n",
       "      <th>age</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>5000.0</td>\n",
       "      <td>12.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>2000.0</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>1000.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Andy</th>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Alice</th>\n",
       "      <td>1000.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Kobe</th>\n",
       "      <td>7000.0</td>\n",
       "      <td>19.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>67000.0</td>\n",
       "      <td>7.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        income   age\n",
       "name                \n",
       "Tom        0.0   0.0\n",
       "Bob     5000.0  12.0\n",
       "Mary    2000.0   5.0\n",
       "James   1000.0   0.0\n",
       "Andy       0.0   NaN\n",
       "Alice   1000.0   0.0\n",
       "Kobe    7000.0  19.0\n",
       "Yafei  67000.0   7.0"
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group[['income', 'age']].apply(lambda x: x - x.min())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### DataFrame返回值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age_diff_max</th>\n",
       "      <th>age_diff_min</th>\n",
       "      <th>income_diff_max</th>\n",
       "      <th>income_diff_min</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>-19.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>-67000</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>-7.0</td>\n",
       "      <td>12.0</td>\n",
       "      <td>-62000</td>\n",
       "      <td>5000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>0.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>0</td>\n",
       "      <td>2000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>-19.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>-66000</td>\n",
       "      <td>1000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Andy</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>-2000</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age_diff_max  age_diff_min  income_diff_max  income_diff_min\n",
       "name                                                               \n",
       "Tom           -19.0           0.0           -67000                0\n",
       "Bob            -7.0          12.0           -62000             5000\n",
       "Mary            0.0           5.0                0             2000\n",
       "James         -19.0           0.0           -66000             1000\n",
       "Andy            NaN           NaN            -2000                0"
      ]
     },
     "execution_count": 68,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_sex_group.apply(lambda x:pd.DataFrame({'age_diff_max':x['age']-x['age'].max(),\n",
    "                                  'age_diff_min':x['age']-x['age'].min(),\n",
    "                                  'income_diff_max':x['income']-x['income'].max(),\n",
    "                                  'income_diff_min':x['income']-x['income'].min()})).head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 统计多个指标"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>income_sum</th>\n",
       "      <th>income_var</th>\n",
       "      <th>income_mean</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sex</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>female</th>\n",
       "      <td>21000.0</td>\n",
       "      <td>1000000.0</td>\n",
       "      <td>7000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>male</th>\n",
       "      <td>95000.0</td>\n",
       "      <td>821000000.0</td>\n",
       "      <td>19000.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        income_sum   income_var  income_mean\n",
       "sex                                         \n",
       "female     21000.0    1000000.0       7000.0\n",
       "male       95000.0  821000000.0      19000.0"
      ]
     },
     "execution_count": 69,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def f(df):\n",
    "    data = {}\n",
    "    data['income_sum'] = df['income'].sum()\n",
    "    data['income_var'] = df['income'].var()\n",
    "    data['income_mean'] = df['income'].mean()\n",
    "    return pd.Series(data)\n",
    "user_info_sex_group.apply(f)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### agg、transform和apply的效率比较\n",
    "参考博客：https://www.cnblogs.com/wkang/p/9794678.html\n",
    "分别计算在同样简单需求下各组合方法的计算时长\n",
    "- transform() 方法+自定义函数\n",
    "- transform() 方法+python内置方法\n",
    "- apply() 方法+自定义函数\n",
    "- agg() 方法+自定义函数\n",
    "- agg() 方法+python内置方法\n",
    "\n",
    "最后得出结论：\n",
    "> agg+python内置方法 > transform+python内置方法 > agg+自定义函数 >= apply+自定义函数 > transform() 方法+自定义函数\n",
    "\n",
    "根据此结论，如果我们对分组结果的操作内置函数可以搞定，就用agg或者transform方法，如果内置函数搞不定，需要自定制方法，那么用agg或者apply是最佳选择。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第五部分：考试成绩统计\n",
    "考试成绩如下：\n",
    "![](https://zhangyafei-1258643511.cos.ap-nanjing.myqcloud.com/Python/blog/pandas-index-3.png)\n",
    "要求：\n",
    "1. 计算每个学生的总成绩\n",
    "\n",
    "2. 计算每个学生各学期的总成绩\n",
    "\n",
    "3. 各门课程平均成绩\n",
    "\n",
    "4. 各学期大于本课程平均成绩的学生姓名及成绩\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "整理数据\n",
    "![](https://zhangyafei-1258643511.cos.ap-nanjing.myqcloud.com/Python/blog/pandas-index-4.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "第一步：读取数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>姓名</th>\n",
       "      <th>课程</th>\n",
       "      <th>学期</th>\n",
       "      <th>成绩</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>王大伟</td>\n",
       "      <td>大学英语</td>\n",
       "      <td>1</td>\n",
       "      <td>92</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>王大伟</td>\n",
       "      <td>大学英语</td>\n",
       "      <td>2</td>\n",
       "      <td>85</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>王大伟</td>\n",
       "      <td>大学英语</td>\n",
       "      <td>3</td>\n",
       "      <td>83</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>王大伟</td>\n",
       "      <td>大学英语</td>\n",
       "      <td>4</td>\n",
       "      <td>90</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>王大伟</td>\n",
       "      <td>高等数学</td>\n",
       "      <td>1</td>\n",
       "      <td>91</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    姓名    课程  学期  成绩\n",
       "0  王大伟  大学英语   1  92\n",
       "1  王大伟  大学英语   2  85\n",
       "2  王大伟  大学英语   3  83\n",
       "3  王大伟  大学英语   4  90\n",
       "4  王大伟  高等数学   1  91"
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "exam_data = pd.read_excel('data/scores.xlsx')\n",
    "exam_data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. 计算每个学生总成绩"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1. 学生总成绩：\n",
      "       总成绩\n",
      "姓名       \n",
      "孙力   1038\n",
      "张明   1081\n",
      "王大伟  1048\n"
     ]
    }
   ],
   "source": [
    "student_total_score = exam_data.groupby(by=['姓名']).agg(\n",
    "    {'成绩': sum}).rename(columns={'成绩': '总成绩'})\n",
    "print('1. 学生总成绩：\\n', student_total_score)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. 每个学生各学期的总成绩"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "2. 学生每个学期总成绩：\n",
      "          成绩\n",
      "姓名  学期     \n",
      "孙力  1   251\n",
      "    2   255\n",
      "    3   277\n",
      "    4   255\n",
      "张明  1   272\n",
      "    2   268\n",
      "    3   276\n",
      "    4   265\n",
      "王大伟 1   261\n",
      "    2   262\n",
      "    3   261\n",
      "    4   264\n"
     ]
    }
   ],
   "source": [
    "student_semester_total = exam_data.groupby(by=['姓名', '学期']).agg(\n",
    "    {'成绩': sum}).rename({'成绩': ' 总成绩'})\n",
    "print('\\n2. 学生每个学期总成绩：\\n', student_semester_total)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. 各门课程平均成绩"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "3. 各门课程平均成绩：\n",
      " 课程\n",
      "大学体育    86.333333\n",
      "大学英语    87.666667\n",
      "高等数学    89.916667\n",
      "Name: 成绩, dtype: float64\n"
     ]
    }
   ],
   "source": [
    "course_avg_score = exam_data.groupby(by=['课程'])['成绩'].mean()\n",
    "print('\\n3. 各门课程平均成绩：\\n', course_avg_score)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. 各学期大于本课程平均成绩的学生姓名及成绩"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "4. 各学期大于本课程平均成绩的学生姓名及成绩: \n",
      "           学期  成绩\n",
      "姓名  课程          \n",
      "王大伟 大学英语   1  92\n",
      "    大学英语   4  90\n",
      "    高等数学   1  91\n",
      "    高等数学   3  98\n",
      "    大学体育   2  91\n",
      "    大学体育   4  90\n",
      "孙力  大学英语   3  93\n",
      "    高等数学   2  93\n",
      "    大学体育   3  99\n",
      "    大学体育   4  88\n",
      "张明  大学英语   1  88\n",
      "    大学英语   2  94\n",
      "    大学英语   3  96\n",
      "    高等数学   1  97\n",
      "    高等数学   3  94\n",
      "    大学体育   1  87\n",
      "    大学体育   4  92\n"
     ]
    }
   ],
   "source": [
    "def judge_score(row):\n",
    "    return row['成绩'] > course_avg_score[row['课程']]\n",
    "greater_than_avg_student = exam_data[exam_data.apply(judge_score, axis=1)].set_index(keys=['姓名', '课程'])\n",
    "print('\\n4. 各学期大于本课程平均成绩的学生姓名及成绩: \\n', greater_than_avg_student)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5. 将结果输出到文件"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 输出文件\n",
    "# with pd.ExcelWriter(path=\"结果.xlsx\") as writer:\n",
    "#     exam_data.to_excel(excel_writer=writer, sheet_name='试题数据', encoding='utf-8', index=False)\n",
    "#     student_total_score.to_excel(excel_writer=writer, sheet_name='学生总成绩', encoding='utf-8')\n",
    "#     student_semester_total.to_excel(excel_writer=writer, sheet_name='每个学生各学期总成绩', encoding='utf-8')\n",
    "#     course_avg_score.to_excel(excel_writer=writer, sheet_name='各门课程平均成绩', encoding='utf-8')\n",
    "#     greater_than_avg_student.to_excel(excel_writer=writer, sheet_name='各学期大于本课程平均成绩的学生姓名及成绩', encoding='utf-8')\n",
    "#     writer.save()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": false,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
